1,166 research outputs found

    PredNet and Predictive Coding: A Critical Review

    Full text link
    PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn a lot of attention and various extensions of the model exist, there is a lack of a critical analysis. We fill in the gap by evaluating PredNet both as an implementation of the predictive coding theory and as a self-supervised video prediction model using a challenging video action classification dataset. We design an extended model to test if conditioning future frame predictions on the action class of the video improves the model performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on the predictive coding theory

    Planning Interdisciplinary Artificial Intelligence Courses For Engineering Students

    Get PDF
    As Artificial Intelligence (AI) becomes increasingly important in engineering, instructors need to incorporate AI concepts into their subject-specific courses. However, many teachers may lack the expertise to do so effectively or don’t know where to start. To address this challenge, we have developed the AI Course Design Planning Framework to help instructors structure their teaching of domain-specific AI skills. This workshop aimed to equip participants with an understanding of the framework and its application to their courses. The workshop was designed for instructors in engineering education who are interested in interdisciplinary teaching and teaching about AI in the context of their domain. Throughout the workshop, participants worked hands-on in groups with the framework, applied it to their intended courses and reflected on the use. The workshop revealed challenges in defining domain-specific AI use cases and assessing learners\u27 skills and instructors\u27 competencies. At the same time, participants found the framework effective in early course development. Overall, the results of the workshop highlight the need for AI integration in engineering education and equipping educators with effective tools and training. It is clear that further efforts are needed to fully embrace AI in engineering education

    PredProp: Bidirectional Stochastic Optimization with Precision Weighted Predictive Coding

    Full text link
    We present PredProp, a method for bidirectional, parallel and local optimisation of weights, activities and precision in neural networks. PredProp jointly addresses inference and learning, scales learning rates dynamically and weights gradients by the curvature of the loss function by optimizing prediction error precision. PredProp optimizes network parameters with Stochastic Gradient Descent and error forward propagation based strictly on prediction errors and variables locally available to each layer. Neighboring layers optimise shared activity variables so that prediction errors can propagate forward in the network, while predictions propagate backwards. This process minimises the negative Free Energy, or evidence lower bound of the entire network. We show that networks trained with PredProp resemble gradient based predictive coding when the number of weights between neighboring activity variables is one. In contrast to related work, PredProp generalizes towards backward connections of arbitrary depth and optimizes precision for any deep network architecture. Due to the analogy between prediction error precision and the Fisher information for each layer, PredProp implements a form of Natural Gradient Descent. When optimizing DNN models, layer-wise PredProp renders the model a bidirectional predictive coding network. Alternatively DNNs can parameterize the weights between two activity variables. We evaluate PredProp for dense DNNs on simple inference, learning and combined tasks. We show that, without an explicit sampling step in the network, PredProp implements a form of variational inference that allows to learn disentangled embeddings from low amounts of data and leave evaluation on more complex tasks and datasets to future work

    Transfer Learning for Speech Recognition on a Budget

    Full text link
    End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers.Comment: Accepted for 2nd ACL Workshop on Representation Learning for NL

    PI-RADS v2 Compliant Automated Segmentation of Prostate Zones Using co-training Motivated Multi-task Dual-Path CNN

    Full text link
    The detailed images produced by Magnetic Resonance Imaging (MRI) provide life-critical information for the diagnosis and treatment of prostate cancer. To provide standardized acquisition, interpretation and usage of the complex MRI images, the PI-RADS v2 guideline was proposed. An automated segmentation following the guideline facilitates consistent and precise lesion detection, staging and treatment. The guideline recommends a division of the prostate into four zones, PZ (peripheral zone), TZ (transition zone), DPU (distal prostatic urethra) and AFS (anterior fibromuscular stroma). Not every zone shares a boundary with the others and is present in every slice. Further, the representations captured by a single model might not suffice for all zones. This motivated us to design a dual-branch convolutional neural network (CNN), where each branch captures the representations of the connected zones separately. Further, the representations from different branches act complementary to each other at the second stage of training, where they are fine-tuned through an unsupervised loss. The loss penalises the difference in predictions from the two branches for the same class. We also incorporate multi-task learning in our framework to further improve the segmentation accuracy. The proposed approach improves the segmentation accuracy of the baseline (mean absolute symmetric distance) by 7.56%, 11.00%, 58.43% and 19.67% for PZ, TZ, DPU and AFS zones respectively.Comment: Authors Arnab Das and Suhita Ghosh contributed equally. Submitted in ISBI 202

    A Systematic Comparison of Music Similarity Adaptation Approaches

    Get PDF
    In order to support individual user perspectives and different retrieval tasks, music similarity can no longer be considered as a static element of Music Information Retrieval (MIR) systems. Various approaches have been proposed recently that allow dynamic adaptation of music similarity measures. This paper provides a systematic comparison of algorithms for metric learning and higher-level facet distance weighting on the MagnaTagATune dataset. A crossvalidation variant taking into account clip availability is presented. Applied on user generated similarity data, its effect on adaptation performance is analyzed. Special attention is paid to the amount of training data necessary for making similarity predictions on unknown data, the number of model parameters and the amount of information available about the music itself. 1

    Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses

    Full text link
    The rising trend of using voice as a means of interacting with smart devices has sparked worries over the protection of users' privacy and data security. These concerns have become more pressing, especially after the European Union's adoption of the General Data Protection Regulation (GDPR). The information contained in an utterance encompasses critical personal details about the speaker, such as their age, gender, socio-cultural origins and more. If there is a security breach and the data is compromised, attackers may utilise the speech data to circumvent the speaker verification systems or imitate authorised users. Therefore, it is pertinent to anonymise the speech data before being shared across devices, such that the source speaker of the utterance cannot be traced. Voice conversion (VC) can be used to achieve speech anonymisation, which involves altering the speaker's characteristics while preserving the linguistic content.Comment: Accepted in The German Annual Conference on Acoustics 2023 (DAGA
    • …
    corecore